Data Analysis on FAANG Stocks from 2013 to 2020

CMSC 320 Final Project Tutorial

Authors: Harshit Raj | Vaibhav Khetan | Yash Kalyani

Project Description -

This project lets readers analyze historical and recent stock data for the FAANG companies. To carry out a predictive, comparative, and quantitative analysis, we extract the relevant information from '.csv' files containing the required stock market data, using several imported packages that make the analysis more efficient. We read each '.csv' file with pandas, which splits the comma-delimited data into tabular columns, and store each company's data as a pandas DataFrame. Because the FAANG companies held their IPOs at different times, for the sake of consistency we chose to graph data only after January 2015. From these DataFrames we pull out the information relevant to building graphs that depict important stock information about each company. We also carry out hypothesis testing to check how well the data fits the line plots and scatter plots we produce.

Motivation -

Today, stocks are extremely important: they play an integral role in determining the growth of a company and can also shape an individual's wealth. People have long invested their money in buying shares of companies and in trading stocks, and some have made fortunes by investing in companies that have grown exponentially over the last few years. However, stocks have shown erratic patterns over the years; they have suffered during recessions and have had setbacks due to internal and external factors. Our group decided to work on a stock analysis that would help us and other readers understand more about the stock market and visualize the various attributes of a particular stock at a particular time. We also wanted to examine how stock trends changed during past recessions (2001, 2008) and compare them to the trends today amidst a worldwide pandemic.

About Imports -

Here we import all the required packages and modules that will be useful in the project. We primarily use pandas, pyplot from matplotlib, and plotly for the interactive graphs. The functions in these modules keep our code compact and make the final product visual and understandable for readers.

In [1]:
!pip3 install plotly==4.14.1
Collecting plotly==4.14.1
Successfully installed plotly-4.14.1 retrying-1.3.3
In [2]:
import json
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=False)
In [3]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import sklearn
import math
from datetime import datetime, date
from sklearn import preprocessing
from sklearn import datasets
from sklearn import utils
from sklearn import linear_model
from sklearn.metrics import *
from sklearn.preprocessing import *
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
In [4]:
import pandas as pd
import numpy as np
facebook = pd.read_csv("data/Facebook.csv", sep=',')
apple = pd.read_csv("data/Apple.csv", sep=',')
amazon = pd.read_csv("data/Amazon.csv", sep=',')
netflix = pd.read_csv("data/Netflix.csv", sep=',')
google = pd.read_csv("data/Google.csv", sep=',')
In [5]:
facebook['Date'] = pd.to_datetime(facebook['Date'])
apple['Date'] = pd.to_datetime(apple['Date'])
amazon['Date'] = pd.to_datetime(amazon['Date'])
netflix['Date'] = pd.to_datetime(netflix['Date'])
google['Date'] = pd.to_datetime(google['Date'])
In [6]:
facebook = facebook[(facebook['Date'].dt.year > 2012) & (facebook['Date'].dt.year < 2021)]
apple = apple[(apple['Date'].dt.year > 2012) & (apple['Date'].dt.year < 2021)]
amazon = amazon[(amazon['Date'].dt.year > 2012) & (amazon['Date'].dt.year < 2021)]
netflix = netflix[(netflix['Date'].dt.year > 2012) & (netflix['Date'].dt.year < 2021)]
google = google[(google['Date'].dt.year > 2012) & (google['Date'].dt.year < 2021)]

facebook = facebook.reset_index(drop=True)
apple = apple.reset_index(drop=True)
amazon = amazon.reset_index(drop=True)
netflix = netflix.reset_index(drop=True)
google = google.reset_index(drop=True)

facebook
Out[6]:
Date Open High Low Close Adj Close Volume
0 2013-01-02 27.440001 28.180000 27.420000 28.000000 28.000000 69846400
1 2013-01-03 27.879999 28.469999 27.590000 27.770000 27.770000 63140600
2 2013-01-04 28.010000 28.930000 27.830000 28.760000 28.760000 72715400
3 2013-01-07 28.690001 29.790001 28.650000 29.420000 29.420000 83781800
4 2013-01-08 29.510000 29.600000 28.860001 29.059999 29.059999 45871300
... ... ... ... ... ... ... ...
1916 2020-08-12 258.970001 263.899994 258.109985 259.890015 259.890015 21428300
1917 2020-08-13 261.549988 265.160004 259.570007 261.299988 261.299988 17374000
1918 2020-08-14 262.309998 262.649994 258.679993 261.239990 261.239990 14792700
1919 2020-08-17 262.500000 264.100006 259.399994 261.160004 261.160004 13351100
1920 2020-08-18 260.950012 265.149994 259.260010 262.339996 262.339996 18677500

1921 rows × 7 columns

In [7]:
corr_df_fb = facebook[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_fb = corr_df_fb.pct_change()

corr_fb = retscomp_fb.corr()
corr_fb
Out[7]:
Open Close High Low Adj Close Volume
Open 1.000000 0.401758 0.769315 0.758697 0.401758 0.016311
Close 0.401758 1.000000 0.747093 0.732999 1.000000 0.007707
High 0.769315 0.747093 1.000000 0.790089 0.747093 0.192635
Low 0.758697 0.732999 0.790089 1.000000 0.732999 -0.178854
Adj Close 0.401758 1.000000 0.747093 0.732999 1.000000 0.007707
Volume 0.016311 0.007707 0.192635 -0.178854 0.007707 1.000000
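For readers unfamiliar with this pattern: `pct_change()` converts prices into daily returns, and `.corr()` measures how those returns move together. (The matrix is symmetric with 1.0 on the diagonal, and Close and Adj Close correlate perfectly because the two columns are identical in this dataset.) A minimal sketch on toy data, with illustrative column names:

```python
import pandas as pd

# Toy price series: 'b' is exactly 2x 'a', so their daily returns
# are identical and the return correlation is (approximately) 1.
prices = pd.DataFrame({"a": [10.0, 11.0, 12.1, 11.5],
                       "b": [20.0, 22.0, 24.2, 23.0]})
returns = prices.pct_change()   # first row becomes NaN
corr = returns.corr()           # pairwise Pearson correlation, NaN rows dropped
print(round(corr.loc["a", "b"], 6))  # 1.0
```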
In [8]:
fig = px.imshow(corr_fb)

fig.update_layout(title='Correlation between Features of Facebook Stock')

iplot(fig,show_link=False)
In [9]:
corr_df_ap = apple[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_ap = corr_df_ap.pct_change()

corr_ap = retscomp_ap.corr()
corr_ap
Out[9]:
Open Close High Low Adj Close Volume
Open 1.000000 0.413942 0.751016 0.768682 0.414086 -0.037703
Close 0.413942 1.000000 0.742652 0.735377 0.999469 -0.106651
High 0.751016 0.742652 1.000000 0.775474 0.741752 0.113300
Low 0.768682 0.735377 0.775474 1.000000 0.734981 -0.264364
Adj Close 0.414086 0.999469 0.741752 0.734981 1.000000 -0.107836
Volume -0.037703 -0.106651 0.113300 -0.264364 -0.107836 1.000000
In [10]:
fig = px.imshow(corr_ap)

fig.update_layout(title='Correlation between Features of Apple Stock')

iplot(fig,show_link=False)
In [11]:
corr_df_am = amazon[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_am = corr_df_am.pct_change()

corr_am = retscomp_am.corr()
corr_am
Out[11]:
Open Close High Low Adj Close Volume
Open 1.000000 0.423578 0.786375 0.747709 0.423578 0.044092
Close 0.423578 1.000000 0.747264 0.757656 1.000000 0.058062
High 0.786375 0.747264 1.000000 0.787892 0.747264 0.236255
Low 0.747709 0.757656 0.787892 1.000000 0.757656 -0.127467
Adj Close 0.423578 1.000000 0.747264 0.757656 1.000000 0.058062
Volume 0.044092 0.058062 0.236255 -0.127467 0.058062 1.000000
In [12]:
fig = px.imshow(corr_am)

fig.update_layout(title='Correlation between Features of Amazon Stock')

iplot(fig,show_link=False)
In [13]:
corr_df_ne = netflix[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_ne = corr_df_ne.pct_change()

corr_ne = retscomp_ne.corr()
corr_ne
Out[13]:
Open Close High Low Adj Close Volume
Open 1.000000 0.425437 0.749779 0.784565 0.425437 0.025012
Close 0.425437 1.000000 0.763005 0.728188 1.000000 0.123859
High 0.749779 0.763005 1.000000 0.774663 0.763005 0.278367
Low 0.784565 0.728188 0.774663 1.000000 0.728188 -0.109167
Adj Close 0.425437 1.000000 0.763005 0.728188 1.000000 0.123859
Volume 0.025012 0.123859 0.278367 -0.109167 0.123859 1.000000
In [14]:
fig = px.imshow(corr_ne)

fig.update_layout(title='Correlation between Features of Netflix Stock')

iplot(fig,show_link=False)
In [15]:
corr_df_go = google[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_go = corr_df_go.pct_change()

corr_go = retscomp_go.corr()
corr_go
Out[15]:
Open Close High Low Adj Close Volume
Open 1.000000 0.384602 0.766185 0.726269 0.384602 0.025639
Close 0.384602 1.000000 0.724499 0.745961 1.000000 0.012011
High 0.766185 0.724499 1.000000 0.800660 0.724499 0.178618
Low 0.726269 0.745961 0.800660 1.000000 0.745961 -0.129019
Adj Close 0.384602 1.000000 0.724499 0.745961 1.000000 0.012011
Volume 0.025639 0.012011 0.178618 -0.129019 0.012011 1.000000
In [16]:
fig = px.imshow(corr_go)

fig.update_layout(title='Correlation between Features of Google Stock')

iplot(fig,show_link=False)
In [17]:
facebook['Company'] = ['Facebook']*len(facebook)
apple['Company'] = ['Apple']*len(apple)
amazon['Company'] = ['Amazon']*len(amazon)
netflix['Company'] = ['Netflix']*len(netflix)
google['Company'] = ['Google']*len(google)

frames = [facebook, apple, amazon, netflix, google]

result = pd.concat(frames)
In [18]:
result['Date'] = pd.to_datetime(result['Date'])

# extract the calendar year directly; avoids a slow per-row loop
result['Year'] = result['Date'].dt.year

comp = result.groupby(['Company', 'Year'])

vol_df = pd.DataFrame()
vol = []
company = []
year = []

for (a, b), val in comp:
    company.append(a)
    year.append(b)
    # mean daily trading volume for this company-year; select the 'Volume'
    # column before averaging so non-numeric columns are ignored
    vol.append(val['Volume'].mean())

vol_df['Company'] = company
vol_df['Year'] = year
vol_df['Volume Mean'] = vol

fig = go.Figure()

avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()

vol_df = vol_df.reset_index(drop=True)

# z-score of each company-year mean volume, computed in one vectorized step
vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol) / stand_vol
In [19]:
avg_close = result.groupby('Date')['Close'].mean()
stand_close = result.groupby('Date')['Close'].std()

stand_close = stand_close.reset_index()
avg_close = avg_close.reset_index()

result['standard_close'] = np.arange(len(result.index))
result = result.reset_index(drop=True)

for x, rows in result.iterrows():
    result.loc[x, 'standard_close'] = (rows['Close'] - avg_close[avg_close['Date'] == rows['Date']]['Close']).values/(stand_close[stand_close['Date'] == rows['Date']]['Close']).values
    
result
Out[19]:
Date Open High Low Close Adj Close Volume Company Year standard_close
0 2013-01-02 27.440001 28.180000 27.420000 28.000000 28.000000 69846400.0 Facebook 2013 -0.663215
1 2013-01-03 27.879999 28.469999 27.590000 27.770000 27.770000 63140600.0 Facebook 2013 -0.665516
2 2013-01-04 28.010000 28.930000 27.830000 28.760000 28.760000 72715400.0 Facebook 2013 -0.659137
3 2013-01-07 28.690001 29.790001 28.650000 29.420000 29.420000 83781800.0 Facebook 2013 -0.661599
4 2013-01-08 29.510000 29.600000 28.860001 29.059999 29.059999 45871300.0 Facebook 2013 -0.661829
... ... ... ... ... ... ... ... ... ... ...
9610 2020-08-31 1643.569946 1644.500000 1625.329956 1629.530029 1629.530029 1321100.0 Google 2020 0.707107
9611 2020-09-01 1632.160034 1659.219971 1629.530029 1655.079956 1655.079956 1133800.0 Google 2020 0.707107
9612 2020-09-02 1668.010010 1726.099976 1660.189941 1717.390015 1717.390015 2476100.0 Google 2020 NaN
9613 2020-09-03 1699.520020 1700.000000 1607.709961 1629.510010 1629.510010 3180200.0 Google 2020 NaN
9614 2020-09-04 1609.000000 1634.989990 1537.970093 1581.209961 1581.209961 2792533.0 Google 2020 NaN

9615 rows × 10 columns
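The row-by-row loop in the cell above standardizes each day's close across the five companies: (close − daily mean) / daily standard deviation. The same idea can be written vectorized with `groupby(...).transform(...)` (a sketch, not the project's code):

```python
import pandas as pd

df = pd.DataFrame({
    "Date":  ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"],
    "Close": [100.0, 300.0, 110.0, 290.0],
})
grp = df.groupby("Date")["Close"]
# z-score of each close within its trading day
df["standard_close"] = (df["Close"] - grp.transform("mean")) / grp.transform("std")
print(df["standard_close"].round(6).tolist())
# [-0.707107, 0.707107, -0.707107, 0.707107]
```

With only two values per date, the z-scores are always ±1/√2 ≈ ±0.707107, which matches the values visible in the output above for dates on which only two companies have data. The trailing NaN rows likely arise because the sample standard deviation of a single value is NaN, so dates on which only one company traded cannot be standardized.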

In [20]:
avg_14 = facebook.Close.rolling(window=14, min_periods=1).mean()
avg_21 = facebook.Close.rolling(window=21, min_periods=1).mean()
avg_100 = facebook.Close.rolling(window=100, min_periods=1).mean()
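`rolling(window=n, min_periods=1).mean()` averages over the last n closes and, thanks to `min_periods=1`, shrinks the window at the start of the series instead of producing NaN for the first n−1 rows. A small sketch:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
# 3-day moving average; early rows average whatever data is available so far
avg_3 = s.rolling(window=3, min_periods=1).mean()
print(avg_3.tolist())  # [1.0, 1.5, 2.0, 3.0]
```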
In [21]:
x_fb = facebook['Date']
y_fb = facebook['Open']
z_fb = facebook['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_fb, y=y_fb, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=z_fb, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='gold', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean']/200000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Facebook from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume')

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
{'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [22]:
avg_14 = apple.Close.rolling(window=14, min_periods=1).mean()
avg_21 = apple.Close.rolling(window=21, min_periods=1).mean()
avg_100 = apple.Close.rolling(window=100, min_periods=1).mean()
In [23]:
x_ap = apple['Date']
y_ap = apple['Open']
z_ap = apple['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_ap, y=y_ap, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=z_ap, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='gold', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean']/3500000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Apple from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume')

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [24]:
avg_14 = amazon.Close.rolling(window=14, min_periods=1).mean()
avg_21 = amazon.Close.rolling(window=21, min_periods=1).mean()
avg_100 = amazon.Close.rolling(window=100, min_periods=1).mean()
In [25]:
x_am = amazon['Date']
y_am = amazon['Open']
z_am = amazon['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_am, y=y_am, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=z_am, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='gold', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean']/2000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Amazon from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume')

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [26]:
avg_14 = netflix.Close.rolling(window=14, min_periods=1).mean()
avg_21 = netflix.Close.rolling(window=21, min_periods=1).mean()
avg_100 = netflix.Close.rolling(window=100, min_periods=1).mean()
In [27]:
x_ne = netflix['Date']
y_ne = netflix['Open']
z_ne = netflix['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_ne, y=y_ne, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=z_ne, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='gold', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean']/50000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Netflix from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume')

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [28]:
avg_14 = google.Close.rolling(window=14, min_periods=1).mean()
avg_21 = google.Close.rolling(window=21, min_periods=1).mean()
avg_100 = google.Close.rolling(window=100, min_periods=1).mean()
In [29]:
x_go = google['Date']
y_go = google['Open']
z_go = google['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_go, y=y_go, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=z_go, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='gold', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Google']['Volume Mean']/2000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Google']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Google from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume')

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [30]:
df_corr = pd.DataFrame()

df_corr['Facebook'] = facebook['Close']
df_corr['Apple'] = apple['Close']
df_corr['Amazon'] = amazon['Close']
df_corr['Netflix'] = netflix['Close']
df_corr['Google'] = google['Close']

retscomp = df_corr.pct_change()

corr = retscomp.corr()
corr
Out[30]:
          Facebook     Apple    Amazon   Netflix    Google
Facebook  1.000000  0.444546  0.505884  0.345712  0.562611
Apple     0.444546  1.000000  0.431872  0.250707  0.522914
Amazon    0.505884  0.431872  1.000000  0.439284  0.601770
Netflix   0.345712  0.250707  0.439284  1.000000  0.413904
Google    0.562611  0.522914  0.601770  0.413904  1.000000
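As a sanity check on what `pct_change().corr()` computes, here is a minimal sketch on two made-up price series (the tickers 'A' and 'B' and all values are hypothetical). Because 'B' is exactly half of 'A', their daily returns are identical, so their correlation is 1 even though the price levels differ.

```python
import pandas as pd

# Toy close prices for two hypothetical tickers (illustrative values only).
prices = pd.DataFrame({
    'A': [100.0, 102.0, 101.0, 105.0],
    'B': [50.0, 51.0, 50.5, 52.5],
})

# pct_change converts price levels to day-over-day returns; corr then
# measures how the *return* series move together, not the raw prices.
returns = prices.pct_change()
corr = returns.corr()
print(corr.loc['A', 'B'])  # → 1.0
```

Correlating returns rather than levels is what makes stocks with very different share prices (e.g. Amazon vs. Facebook) comparable in the matrix above.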
In [31]:
fig = px.imshow(corr)

fig.update_layout(title='Correlation between All FAANG Stocks Close Price')

iplot(fig,show_link=False)
In [32]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=facebook.Date, y=facebook.Close, name='FB'))
fig.add_trace(go.Scatter(x=apple.Date, y=apple.Close, name='AAPL'))
fig.add_trace(go.Scatter(x=amazon.Date, y=amazon.Close, name='AMZN'))
fig.add_trace(go.Scatter(x=netflix.Date, y=netflix.Close, name='NFLX'))
fig.add_trace(go.Scatter(x=google.Date, y=google.Close, name='GOOG'))

fig.update_layout(title='Close prices for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Facebook',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False]},
                          {'title': 'FB',
                           'showlegend':True}]),
             dict(label = 'Apple',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False]},
                          {'title': 'AAPL',
                           'showlegend':True}]),
             dict(label = 'Amazon',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False]},
                          {'title': 'AMZN',
                           'showlegend':True}]),
             dict(label = 'Netflix',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False]},
                          {'title': 'NFLX',
                           'showlegend':True}]),
             dict(label = 'Google',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True]},
                          {'title': 'GOOG',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [33]:
fig = px.line(result, x="Date", y="standard_close", color='Company')

fig.update_layout(title='Standardized Close prices for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Standardized Close Price')

iplot(fig,show_link=False)
In [34]:
result['Date'] = pd.to_datetime(result['Date'])
result['Year'] = result['Date'].dt.year

comp = result.groupby(['Company', 'Year'])

vol_df = pd.DataFrame()
vol = []
company = []
year = []

for key, val in comp:
    a, b = key
    company.append(a)
    year.append(b)
    vol.append(val['Volume'].mean())

vol_df['Company'] = company
vol_df['Year'] = year
vol_df['Volume Mean'] = vol

fig = go.Figure()

avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()

vol_df = vol_df.reset_index(drop=True)
vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol) / stand_vol
    
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], y=vol_df[vol_df['Company'] == 'Facebook']['standard_vol'], name='FB'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], y=vol_df[vol_df['Company'] == 'Apple']['standard_vol'], name='AAPL'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], y=vol_df[vol_df['Company'] == 'Amazon']['standard_vol'], name='AMZN'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], y=vol_df[vol_df['Company'] == 'Netflix']['standard_vol'], name='NFLX'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], y=vol_df[vol_df['Company'] == 'Google']['standard_vol'], name='GOOG'))

fig.update_layout(title='Standardized Volume for All Companies from Jan 2013 to Aug 2020 Grouped by Year',
                   xaxis_title='Date',
                   yaxis_title='Standard Volume')

iplot(fig,show_link=False)
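The `standard_vol` column above is a plain z-score. A minimal sketch on toy numbers (the volumes here are illustrative, not real FAANG data):

```python
import pandas as pd

# Toy yearly mean volumes (illustrative numbers only).
vols = pd.Series([10.0, 20.0, 30.0])

# Same transform as standard_vol above: subtract the overall mean and
# divide by the (sample) standard deviation.
z = (vols - vols.mean()) / vols.std()
print(z.tolist())  # → [-1.0, 0.0, 1.0]
```

Standardizing this way puts companies with wildly different absolute trading volumes on one comparable axis, which is why the grouped bar chart is readable at all.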
In [35]:
facebook['timestamp'] = pd.to_datetime(facebook.Date).astype('int64') // (10**9)
X = np.array(facebook['timestamp']).reshape(-1,1)
y = np.array(facebook['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Facebook from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
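The regression cells convert calendar dates into Unix-epoch seconds so that scikit-learn and statsmodels can treat time as an ordinary numeric feature. A minimal sketch of that conversion (the two dates are illustrative):

```python
import pandas as pd

# Dates become nanoseconds since the Unix epoch as int64; integer-dividing
# by 10**9 turns nanoseconds into seconds, matching the 'timestamp' column.
df = pd.DataFrame({'Date': ['2013-01-02', '2013-01-03']})
ts = pd.to_datetime(df['Date']).astype('int64') // 10**9
print(ts.tolist())  # → [1357084800, 1357171200]
```

One consequence: the fitted slopes are in dollars per second, which is why the OLS coefficients below are tiny numbers like 8.6e-08.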
In [36]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.856
Model:                            OLS   Adj. R-squared (uncentered):              0.856
Method:                 Least Squares   F-statistic:                          1.145e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:01   Log-Likelihood:                         -10322.
No. Observations:                1921   AIC:                                  2.065e+04
Df Residuals:                    1920   BIC:                                  2.065e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          8.612e-08   8.05e-10    107.007      0.000    8.45e-08    8.77e-08
==============================================================================
Omnibus:                      372.431   Durbin-Watson:                   0.003
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               76.141
Skew:                           0.017   Prob(JB):                     2.93e-17
Kurtosis:                       2.025   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
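The summary reports R-squared (uncentered) because `sm.OLS` adds no intercept by default, so the fit is slope-only and R² is measured against the raw sum of squares of y rather than its variance about the mean. A small numpy sketch of the two definitions on synthetic data (not the stock series):

```python
import numpy as np

# Synthetic data: a linear trend with an intercept, fit slope-only,
# just as sm.OLS(y, X) does when X contains no constant column.
rng = np.random.default_rng(0)
x = np.arange(1.0, 51.0)
y = 3.0 * x + 10.0 + rng.normal(0.0, 1.0, 50)

beta = (x @ y) / (x @ x)   # least-squares slope through the origin
resid = y - beta * x
rss = resid @ resid

r2_uncentered = 1.0 - rss / (y @ y)                           # vs. sum of y^2
r2_centered = 1.0 - rss / ((y - y.mean()) @ (y - y.mean()))   # usual R^2
```

Because the uncentered denominator is always at least as large as the centered one, the uncentered R² in these summaries reads slightly higher than the conventional R² would; adding `sm.add_constant(X)` would recover the usual intercept-included fit.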
In [37]:
apple['timestamp'] = pd.to_datetime(apple.Date).astype('int64') // (10**9)
X = np.array(apple['timestamp']).reshape(-1,1)
y = np.array(apple['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Apple from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [38]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.823
Model:                            OLS   Adj. R-squared (uncentered):              0.823
Method:                 Least Squares   F-statistic:                              8980.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:01   Log-Likelihood:                         -8302.5
No. Observations:                1931   AIC:                                  1.661e+04
Df Residuals:                    1930   BIC:                                  1.661e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          2.599e-08   2.74e-10     94.764      0.000    2.54e-08    2.65e-08
==============================================================================
Omnibus:                      675.510   Durbin-Watson:                   0.002
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2247.580
Skew:                           1.755   Prob(JB):                         0.00
Kurtosis:                       6.951   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [39]:
amazon['timestamp'] = pd.to_datetime(amazon.Date).astype('int64') // (10**9)
X = np.array(amazon['timestamp']).reshape(-1,1)
y = np.array(amazon['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Amazon from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [40]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.716
Model:                            OLS   Adj. R-squared (uncentered):              0.716
Method:                 Least Squares   F-statistic:                              4845.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:01   Log-Likelihood:                         -15159.
No. Observations:                1919   AIC:                                  3.032e+04
Df Residuals:                    1918   BIC:                                  3.033e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           7.01e-07   1.01e-08     69.603      0.000    6.81e-07    7.21e-07
==============================================================================
Omnibus:                      180.785   Durbin-Watson:                   0.001
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              234.645
Skew:                           0.855   Prob(JB):                     1.12e-51
Kurtosis:                       2.908   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [41]:
netflix['timestamp'] = pd.to_datetime(netflix.Date).astype('int64') // (10**9)
X = np.array(netflix['timestamp']).reshape(-1,1)
y = np.array(netflix['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Netflix from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [42]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.688
Model:                            OLS   Adj. R-squared (uncentered):              0.688
Method:                 Least Squares   F-statistic:                              4218.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -11892.
No. Observations:                1910   AIC:                                  2.379e+04
Df Residuals:                    1909   BIC:                                  2.379e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          1.231e-07   1.89e-09     64.949      0.000    1.19e-07    1.27e-07
==============================================================================
Omnibus:                      396.692   Durbin-Watson:                   0.002
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              215.406
Skew:                           0.684   Prob(JB):                     1.68e-47
Kurtosis:                       2.087   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [43]:
google['timestamp'] = pd.to_datetime(google.Date).astype('int64') // (10**9)
X = np.array(google['timestamp']).reshape(-1,1)
y = np.array(google['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Google from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [44]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.910
Model:                            OLS   Adj. R-squared (uncentered):              0.910
Method:                 Least Squares   F-statistic:                          1.965e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -13599.
No. Observations:                1934   AIC:                                  2.720e+04
Df Residuals:                    1933   BIC:                                  2.721e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          5.899e-07   4.21e-09    140.167      0.000    5.82e-07    5.98e-07
==============================================================================
Omnibus:                      284.325   Durbin-Watson:                   0.003
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              108.466
Skew:                           0.374   Prob(JB):                     2.80e-24
Kurtosis:                       2.114   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [45]:
df = facebook[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Prediction',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression Prediction',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor Prediction',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Facebook over the Last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [46]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R^2 score: " + str(confidenceknn))
print("Decision Tree Regressor R^2 score: " + str(confidencetree))
print("Linear Regressor R^2 score: " + str(confidencelr))
k-NN Regressor R^2 score: 0.9032003738135771
Decision Tree Regressor R^2 score: 0.8562651241732164
Linear Regressor R^2 score: 0.8577460290447015
In [47]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.951
Model:                            OLS   Adj. R-squared (uncentered):              0.951
Method:                 Least Squares   F-statistic:                          2.776e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -7050.3
No. Observations:                1421   AIC:                                  1.410e+04
Df Residuals:                    1420   BIC:                                  1.411e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.3322      0.008    166.613      0.000       1.317       1.348
==============================================================================
Omnibus:                      225.808   Durbin-Watson:                   0.015
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              146.928
Skew:                          -0.669   Prob(JB):                     1.24e-32
Kurtosis:                       2.167   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [48]:
df = apple[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Apple over the Last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
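The target construction in these cells hinges on `shift(-future_days)`: each row's label is the closing price `future_days` trading days later, and the trailing rows, whose future is unknown, become NaN and are excluded by the `[:-future_days]` slices. A tiny illustration with assumed toy data:

```python
import pandas as pd

toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0]})
future_days = 2

# Each row's Prediction is the Close two rows ahead; the last two rows
# have no observed future, so their Prediction is NaN.
toy['Prediction'] = toy[['Close']].shift(-future_days)
print(toy)
```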
In [49]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R^2 score: " + str(confidenceknn))
print("Decision Tree Regressor R^2 score: " + str(confidencetree))
print("Linear Regressor R^2 score: " + str(confidencelr))
k-NN Regressor R^2 score: 0.8167324803975498
Decision Tree Regressor R^2 score: 0.7570995593812778
Linear Regressor R^2 score: 0.7210480558319772
In [50]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.956
Model:                            OLS   Adj. R-squared (uncentered):              0.956
Method:                 Least Squares   F-statistic:                          3.100e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -5329.4
No. Observations:                1431   AIC:                                  1.066e+04
Df Residuals:                    1430   BIC:                                  1.067e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.5272      0.009    176.058      0.000       1.510       1.544
==============================================================================
Omnibus:                      177.603   Durbin-Watson:                   0.015
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              282.447
Skew:                           0.851   Prob(JB):                     4.65e-62
Kurtosis:                       4.357   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [51]:
df = amazon[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Amazon over the Last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [52]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R^2 score: " + str(confidenceknn))
print("Decision Tree Regressor R^2 score: " + str(confidencetree))
print("Linear Regressor R^2 score: " + str(confidencelr))
k-NN Regressor R^2 score: 0.9564441290417927
Decision Tree Regressor R^2 score: 0.9335955304988008
Linear Regressor R^2 score: 0.8667269486538978
In [53]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.962
Model:                            OLS   Adj. R-squared (uncentered):              0.962
Method:                 Least Squares   F-statistic:                          3.589e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -9985.0
No. Observations:                1419   AIC:                                  1.997e+04
Df Residuals:                    1418   BIC:                                  1.998e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.7599      0.009    189.435      0.000       1.742       1.778
==============================================================================
Omnibus:                      168.223   Durbin-Watson:                   0.017
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              296.057
Skew:                          -0.776   Prob(JB):                     5.15e-65
Kurtosis:                       4.611   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [54]:
df = netflix[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Netflix over the Last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [55]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R^2 score: " + str(confidenceknn))
print("Decision Tree Regressor R^2 score: " + str(confidencetree))
print("Linear Regressor R^2 score: " + str(confidencelr))
k-NN Regressor R^2 score: 0.8264777911830616
Decision Tree Regressor R^2 score: 0.714807077931656
Linear Regressor R^2 score: 0.6100702876947401
In [56]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.871
Model:                            OLS   Adj. R-squared (uncentered):              0.871
Method:                 Least Squares   F-statistic:                              9556.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -8359.2
No. Observations:                1410   AIC:                                  1.672e+04
Df Residuals:                    1409   BIC:                                  1.673e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.6538      0.017     97.752      0.000       1.621       1.687
==============================================================================
Omnibus:                       30.404   Durbin-Watson:                   0.009
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               61.684
Skew:                          -0.055   Prob(JB):                     4.03e-14
Kurtosis:                       4.019   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [57]:
df = google[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))


predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Google over the Last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price')

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [58]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R^2 score: " + str(confidenceknn))
print("Decision Tree Regressor R^2 score: " + str(confidencetree))
print("Linear Regressor R^2 score: " + str(confidencelr))
k-NN Regressor R^2 score: 0.9354846674304177
Decision Tree Regressor R^2 score: 0.8916308694033612
Linear Regressor R^2 score: 0.8891073069975874
In [59]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.989
Model:                            OLS   Adj. R-squared (uncentered):              0.989
Method:                 Least Squares   F-statistic:                          1.296e+05
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        16:06:02   Log-Likelihood:                         -8729.4
No. Observations:                1434   AIC:                                  1.746e+04
Df Residuals:                    1433   BIC:                                  1.747e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.3352      0.004    360.017      0.000       1.328       1.343
==============================================================================
Omnibus:                       27.027   Durbin-Watson:                   0.044
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               30.071
Skew:                          -0.288   Prob(JB):                     2.95e-07
Kurtosis:                       3.414   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
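The five per-company cells above repeat the same pipeline and differ only in the input dataframe and the plot title. As a closing note, the modeling portion could be factored into one helper; this is a sketch under our own naming choices (`fit_close_models`, the returned dicts, and the fixed `random_state` are not from the notebook), demonstrated on synthetic price-like data rather than the real CSVs:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

def fit_close_models(frame, future_days=500, test_size=0.25, seed=0):
    """Fit the three regressors used above on a frame with a 'Close' column.

    Returns the fitted models and their held-out R^2 scores.
    """
    df = frame[['Close']].copy(deep=True)
    df['Prediction'] = df[['Close']].shift(-future_days)

    # Drop the trailing rows whose future close is unknown (NaN).
    X = df[['Close']].to_numpy()[:-future_days]
    y = df['Prediction'].to_numpy()[:-future_days]

    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)

    models = {
        'tree': DecisionTreeRegressor().fit(x_train, y_train),
        'lr': LinearRegression().fit(x_train, y_train),
        'knn': KNeighborsRegressor().fit(x_train, y_train),
    }
    scores = {name: m.score(x_test, y_test) for name, m in models.items()}
    return models, scores

# Usage sketch on a synthetic upward-drifting price series; a real run
# would pass one of the company frames, e.g. fit_close_models(apple).
rng = np.random.RandomState(0)
prices = pd.DataFrame({'Close': np.cumsum(rng.normal(0.1, 1.0, 2000)) + 100})
models, scores = fit_close_models(prices, future_days=500)
print(scores)
```

Beyond removing duplication, a single helper makes it harder for per-company copies to drift apart (as the mismatched dropdown titles in the original cells show).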
In [ ]: